Assessing the Reliability of Skills Measured by the SAT®

Authors

  • Maureen Ewing
  • Kristen Huff
  • Melissa Andrews
  • Kinda King
Abstract

Recipients of educational score reports generally welcome more descriptive feedback about examinee performance than a total score or a percentile rank can provide. This is not surprising, as descriptive score reports have the potential to help score users develop student-based instructional plans and to suggest areas for classroom-based instructional intervention. In fact, under the No Child Left Behind Act of 2001 (NCLB), state testing programs are mandated by law to “produce individual student interpretive, descriptive, and diagnostic reports” [section 1111(b)(3)(c)(xii)]. Furthermore, this detailed information must be provided in such a way that the validity and reliability of the scores are maintained [section 1111(b)(3)(c)(iii)]. Due in part to this legislation, there is a general need for descriptive score reports that produce reliable scores and facilitate valid interpretations of student performance.

When descriptive score reports provide reliable and valid information, they offer the possibility of improving the consequential validity of test score use and interpretation. For this reason, the College Board is committed to conducting research to identify reliable and valid ways of providing examinees with more descriptive feedback about their test performance. In connection with the new SAT® that was introduced in March 2005, research has been under way to investigate the feasibility of providing examinees with score reports that contain feedback on skills measured by the critical reading, mathematics, and writing sections of the test.

Although previous score reports for the SAT have provided cluster scores based on content specifications or item type, such scores do not typically provide great insight into whether an examinee will correctly answer a particular test item (Embretson and Gorin, 2001; Wainer, Sheehan, and Wang, 2000). This is because, to meet test form assembly guidelines, the content specifications for a particular domain are written to cover a range of difficulty. As a result, there are usually some easy, medium, and difficult items within each domain. An examinee of average ability would be expected to correctly answer the easy items and most of the items of medium difficulty across all content domains. In this situation, feedback based solely on content domains or item type would suggest to the student that he or she needs improvement in all areas, which is neither very informative nor targeted.

To generate feedback that has the potential to be more meaningful, the College Board asked subject matter experts (SMEs), including content specialists, measurement experts, and cognitive psychologists, to specify a set of skill categories hypothesized to underlie performance on each SAT test section (i.e., critical reading, mathematics, and writing). Once the models were hypothesized, items were coded to specific skill categories. Although the models allowed items to be coded to multiple skills, the skill category primarily involved in solving each item was also noted. One way of providing feedback to examinees is to report skill scores based on the primary skill codes generated by the SMEs, which are described in Table 1. Such skill scores have the potential to be more informative than conventional cluster scores based on content or item type because the skills reflect an underlying model of student performance in the domain.
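
The idea of reporting skill scores from primary skill codes can be illustrated with a minimal Python sketch. It is not the College Board's scoring procedure: the item identifiers, the skill labels `skill_A` and `skill_B`, and the use of a simple number-correct count per skill are all assumptions made for illustration, since the actual skill categories appear in Table 1, which is not reproduced here.

```python
from collections import defaultdict

# Hypothetical mapping from item to its primary skill code.
# The real categories are defined by the SMEs (Table 1 in the report).
PRIMARY_SKILL = {
    "q1": "skill_A",
    "q2": "skill_A",
    "q3": "skill_B",
    "q4": "skill_B",
    "q5": "skill_B",
}


def skill_scores(responses, primary_skill):
    """Return {skill: (number_correct, number_of_items)} for one examinee.

    responses maps item id -> True/False (answered correctly or not).
    Each item contributes only to its primary skill category.
    """
    totals = defaultdict(lambda: [0, 0])
    for item, correct in responses.items():
        skill = primary_skill[item]
        totals[skill][0] += int(correct)  # count correct answers
        totals[skill][1] += 1             # count items coded to the skill
    return {skill: (c, n) for skill, (c, n) in totals.items()}


# Hypothetical response pattern for a single examinee.
responses = {"q1": True, "q2": False, "q3": True, "q4": True, "q5": False}
scores = skill_scores(responses, PRIMARY_SKILL)

for skill, (num_correct, num_items) in sorted(scores.items()):
    print(f"{skill}: {num_correct} of {num_items} correct")
```

Because each skill score in such a scheme rests on only the subset of items coded to that skill, its reliability would need to be examined separately from that of the total score.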

Publication date: 2005